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ABSTRACT 

The  prime  factor  FFT  (PF  FFT),  developed  by 
Kolba  and  Parks,  makes  use  of  recent  computational 
complexity  results  by  Winograd  to  compute  the  DFT 
with  a  fewer  number  of  multiplications  than  that 
required  by  the  FFT.  Patterson  and  McClellan  have 
derived  an  expression  for  the  MSE  in  the  PF  FFT 
assuming  finite  precision  fixed  point  arithmetic. 

In  this  paper  we  derive  a  bound  on  the  MSE  in  the 
PF  FFT  assuming  floating  point  arithmetic.  In  the 
course  of  the  derivation  an  expression  for  the 
actual  MSE  is  also  presented,  but  is  seen  to  be  too 
complicated  to  be  of  practical  use. 


I.  INTRODUCTION 

Fairly  recently  a  new  class  of  algorithms  has 
emerged  for  computing  the  DFT  with  a  fewer  number 
of  multiplications  than  that  required  by  the  FFT. 
rhe  first  of  these  algorithms  was  developed  by 
Vino grad  f  L , 2 ]  and  makes  use  of  his  formulation  for 
performing  convolution  with  the  minimum  number  of 
multiplications  (3).  This  algorithm  has  been 
termed  the  Winograd  Fourier  transform  algorithm 
(WFTA )  [4J.  An  unnested  version  of  the  WFTA  has 
been  proposed  by  Kolba  and  Parks  and  termed  the 
prime  factor  FFT  ( PF  FFT)  f 5 ] . 

It  is  of  interest  to  investigate  the  effects 
of  finite  register  length  in  these  new  algorithms. 
Patterson  and  McClellan  have  derived  expressions 
for  the  average  MSE  in  both  the  WFTA  and  ?F  FFT, 
assuming  a  statistical  error  model  and  fixed  point 
arithmetic  f6J.  In  this  paper  we  restrict  attention 
to  the  PF  FFT  and  derive  an  expression  for  the  MSE 
assuming  floating  point  arithmetic.  The  resulting 
expression  Is  quite  cumbersome,  but  a  bound  on  the 
MSE  is  also  derived  which  is  relatively  easy  to 
compute . 

II.  PRELIMINARIES 
A.  PF  FFT  Algorithm 

A  one  dimensional  Winograd  type  DFT  algorithm 
can  be  represented  in  matrix  notation  as 

Y  -  CDA  %  (l) 

'•'This  research  was  sponsored  through  Princeton 
Uni vers i tv  by  the  Air  Force  Office  of  Scientific 
Research.  L'SAF  under  Grant  ^^-AFOSR-75-3083  and 
through  the  University  of  Illinois  by  the  Joint 
Services  Electronics  Program. 


where  £  is  the  Input  vector  and  Y  is  the  DFT  out¬ 
put.  The  rectangular  matrices  A  and  C  correspond 
to  the  input  and  output  stages  of  the  algorithm  and 
contain  only  0  and  +1  entries.  The  matrix  D  is 
diagonal  and  contains  the  only  multiplications  in 
the  algorithm.  As  an  example,  the  3  point  DFT  may 
be  written  as  [5] 
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In  this  particular  example  A  and  C  are  square,  but 
this  is  not  generally  the  case. 

An  M-dimensional  PF  FFT  is  a  cascade  of  M  mod¬ 
ules,  each  consisting  of  several  at  the  computa¬ 
tions  given  by  (l).  The  PF  FFT  may  be  written  as 

<3> 

where  "x"  denotes  a  Kronecker  product,  and  x  and  X 
are  the  DFT  input  and  output  respectively.  it  is 

assumed  that  the  dimension  of  C  D  A  is  N  and  that 

m  m  m  m 

M 

the  dimension  of  x  is  N  with  N  *  N  and  the  N 
-  m  m 

m"l 

relatively  prime.  Thus  to  compute  the  DFT  accord¬ 
ing  to  (3),  only  N  point  DFT's  are  required, 
ra 

Fig.  I  gives  a  pictorial  representation  of  (31 
for  M“2  modules.  Each  plane  in  the  figure  is  a 
computation  as  given  by  (1).  To  date,'  algorithms 
have  been  published  for  only  four  mutually  prime 
sequence  lengths^N  .  Hence,  M*4  is  the  maximum 
number  of  modules. 

B .  Characterization  of  Floating  Point  Errors 
We  shall  be  concerned  with  binary  machines 
using  floating  point  arithmetic  with  a  double 
precision  accumulator.  Thus,  each  machine  number 

may  be  expressed  as  (signl-a-Z*5  where  the  mantissa 
'a'  Is  a  fraction  between  j  and  1  and  the  exponent 

'b'  is  an  Integer.  It  shall  be  assumed  that  B 
bits  are  used  for  the  mantissa  and  that  enough  bits 
are  alloted  to  the  exponent  to  prevent  overflow. 

If  we  let  fl(-)  denote  the  machine  number  re¬ 
sulting  from  a  real  floating  point  operation,  then 
it  is  well  known  chat  f 7 7  for  rose  machines 
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fl(x+y)  - 
fl(x-y)  - 

where  che  errors  6 ^ 


(x+y)(I+41) 

(x-y)(H-42) 


(4) 


and  S2  satisfy  -2'B/2<6i<2  B/2 


_  O 

for  rounding  and  -2  <S^<0  for  truncation. 

The  errors  4^  and  4^  are  typically  modelled  as 

random  variables,  independent  of  x  and  y  and  uni¬ 
formly  distributed  on  their  range  [7],  The  bound 
to  be  derived  in  the  next  section  will  require 
knowledge  of  only  the  mean  squared  values  of  the 
4^,  which  can  be  easily  computed.  For  example, 

assuming  rounding  we  have  for  real  addition  and 
multiplication  ,  ,-2B 

E^i} "  hr 


For  the  remainder  of  this  paper  we  shall  assume 
rounding  arithmetic. 

The  PF  FFT  is  composed  of  many  short  one  di¬ 
mensional  OFT's,  each  implemented  as  in  (1).  In 
the  next  section  we  shall  require  an  expression  for 
the  error  vector  at  Che  output  of  a  single  one 
dimensional  DFT.  Let  y  and  Y  be  the  respective 
input  and  desired  output  of  a  single  N  point  DFT. 
Then  n  T  & 

+  iC.\V  (5) 

D  T 

where  y  and  y  are  the  real  and  Imaginary  parts  of 
y  respectively.  Denote  the  actual  machine  output 
by  Y.  Then  the  error  vector  at  the  DFT  output, 
v(m)  «  Y  -  Y,  may  be  written  as 

v(m)*  Q(m)yR  +  jQ(m)yr  +  [CWJ8  +  JG(m)Yri  (6) 

where  Q(m)  is  an  error  matrix  associated  with  a 
single  N  point  DFT  of  a  real  input  array.  The  , 
error  mafrix  Q(m)  may  be  obtained  as  follows.  The 
actual  output  Y,  resulting  from  a  real  input  array, 
is  computed  by  substituting  fl(')  for  each  multi¬ 
plication  and  addition  occurlng  in  (1).  The  ith 
substitution  is  made  with  an  error  source  4^  where 

it  is  assumed  the  [4^  are  uncorrelaced.  If  desired, 
dRR(l+4i>  may  be  substituted  for  the  diagonal  ele¬ 
ments  d^k  in  D,  so  that  the  error  due  to  storage  of 

the  multiplier  coefficients  is  also  accounted  for. 
All  second  order  effects  involving  terms  of  the 
form  4^4^  are  dropped  from  Y.  The  output  Y  is  than 

subtracted,  giving  y(m)  to  first  order.  Each  ele- 

o 

ment  of  v(m)  is  a  linear  combination  of  the  y^ 

with  coefficients  (the  entries  in  Q(m))  given  by 
linear  cosiblnatlons  of  the  4^.  The  matrix  Q(m)  is 

Che  error  matrix  associated  with  an  N  point  DFT  of 

a  purely  imaginary  array.  Q(m)  has  the  same  form 
as  <3(m),  however  the  elements  In  these  two  matrices 
are  assumed  to  be  uncorrelated. since  they  arise 
from  different  multiplications.  The  last  term  in 

O 

(61  ‘.s  due  to  adding  the  DFT  of  y  to  the  DFT  of 

Jvl  In  '3).  Beth  G(m)  and  G(m)  are  diagonal  ma- 

-2B 

trices  with  each  element  having  variance  2  / 12 . 

A  fact  which  will  be  neglected  is  that  the  first 


element  in  both  G(m)  and  G(m)  is  actually  zero.R 
This  follows  since  Che  component  of  Y.  due  to  y 

1  I 

is  purely  real  and  the  component  of  Y^  due  to  jy 

is  purely  imaginary  so  that  no  error  occurs  in 
adding  these  two  components. 


III.  FLOATING  POINT  ERROR  BOUND 


It  is  desired  to  bound  the  average  MSS  (over 
all  outputs)  at  Che  PP  FFT  output.  This  error  re¬ 
sults  from  computational  errors  which  originate 
within'  each  module  in  (3)  and  then  propagate  to  the 
output.  It  is  expedient  to  conelder  the  effects  of 
the  errors  from  each  module  separately. 

It  is  somewhat  difficult  to  picture  the  PF  FFT 
for  (02.  However,  each  portion  of  the  computation, 
from  the  mth  module  to  the  output,  may  still  be 
represented  in  a  manner  similar  to  Fig.  1.  Fig.  2 
illustrates  a  particular  stage  S^(m),  which  is  the 

transformation  from  a  portion  of  the  mth  module 
input  to  the  PF  FFT  output.  The  PF  FFT  contains 
N, • • *N  of  these  stages  in  parallel.  Hence,  the 
1  m- 1 

index  l  runs  from  1  to  .  end  each  stage 

L  m- 1 

computes  only  part  of  the  PF  FFT  output. 

Consider  the  kth  N  point  DFT  in  stage  S,(m). 

®  * 

Let  the  error  at  the  jth  output  of  this  DFT.  due  to 
errors  originating  within  this  DFT,  be  defined  as 
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Then  let 
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be  the  rth  output  of  the  sth  (N^| 
in  (m)  due  to  the  Cjfc(m). 

Vs  shall  firsc  compute 
1 


•N  )  point  DFT 


which  is  the  average  MSE  at  the  PF  FFT  output  due 
to  errors  originating  within  the  mth  module.  For 
fixed  X  and  s  it  follows  from  Parseval  that 


<*wv  =  ei«.V*>i2  • 


Therefore 


®  it 


where 
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(7) 

(8) 
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is  che  average  MSE  at  the  output  of  the  rath  module 
due  to  errors  originating  within  that  raoduLe. 

•> 

An  expression  for  is  now  required  in 

terms  of  the  algorithm  parameters.  To  investigate 
the  errors  occuring  within  the  rath  module,  (3)  is 
rewritten  using  a  standard  Kronecker  product  iden¬ 
tity  giving 


DA  )xl  /I,). 

'  M  m+1  m  m  m  o-l  1 


(9) 


dv 


<Imy(Cm-lDn1-LAm-l)y---y(CIDlAl)'5- 


Similarly  X  may  be  written  as  a  product  of  M  ma¬ 
trices  times  x  where  the  ith  matrix  represents 
the  transformation  performed  by  the  ith  module. 

In  (9)  we  have  broken  the  computation  around  the 
rath  module.  The  output  x(m)  °£  this  module  may  be 
written  as 


may  be  further  simplified  since 

EixR\m>F%)F(m)xR(m))-E{xR*(m)EfF*(m)F(m>}xR(m)! 

■E(xR  (nO^y  l  x R  <  n» ) } 

m~2~  E^-  (®)1  • 

The  second  equality  follows  because  F(m)  is  a  diag 
onai  matrix  with  each  element  having  variance 
■  2B 

2  / 12 .  The  Use  term  In  (13)  Is  equal  Co  a  sim¬ 

ilar  expression  so  Chat  the  sum  of  the  Use  two 
terms  is  given  by 

-2B 

2-jj-  [E[  xR*(m)xR(m) }  +  Efx1  (a)xl(m)}l  • 

2'2b  * 

— jy  E(x  (m)x(m)} . 


x<m>  ■  fl 
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(10) 


where  xfm-i)  is  the  input  to  the  rath  module  given 
bv  x(mri)-fIMv...xIinX(Cn.lDm.lAm.1)x...x(C1DlA1),x. 


It  is  apparent  from  (6)  and  (10)  that  the  error 
€(m)  in  x(m)  may  be  expressed  as 

6(m)  *  Pfm)xR(m-l)  +  j P(ra)x* (ra- 1)  +  F(m)x*(ra) 

+  j?(m)x‘ 


where 


P(m) 

*  tv- 

••ytmAl  y  Q(ra)  Y  VlV 

x  1  3S 

•••yIi1 

(12) 

Fi'tnl 

4  rv- 

••'Vi  ■ y  w- 

...xl^33. 

Here  r-fSS  indicates  that  each  time  Q(ra)  or  G(m)  is 
repeated  in  the  Kronecker  product,  its  elements  are 
superscripted  relating  them  to  their  respective 

C  D  A  implementation.  Without  this  additional 
m  ra  m 

superscript,  the  5.  arising  from  one  implementation 

of  C  DA  would  be  assumed  to  he  identical  to  those 
ra  m  m 

arising  from  another  implementation.  However, 
since  the  Inputs  to  the  various  implementations  are 
different,  we  shall  in  fact  assume  these  errors  to 
be  uncorrelated. 

From  the  definition  of  €(m),  the  quantity 

2 

7  »m),  given  by  (8)  may  be  written  as 

(m)  *  jj  E  _  6  (m)  6  (m) | 

where  M*”  denotes  complex  conjugate  transpose. 
Substituting  from  (11)  gives 

3fcnB)  *  ^  E*.  (m-l)P  (m)  P(m)xR(ra- 1 ) 

*•  xI'*(m-l)?*(m)P(ra)xI  (m-  L)  (13) 

*  xR  fra)F  (m)Ffm)x^(m) 

-*x^  m>F'm)F('ra)j(^tn);  ? 

where  all  cross  terras  are  zero  and  have  been  omit¬ 
ted  since  P^m)  P'nO,  Ffm),  and  F(nO  are  all  uncor- I 
reLated  ard  zero-mean.  Hie  last  two  terras  In  (13)  ) 


2 

a  (m)  given  by  (13)  can  now  be  rewritten  as 
cr2(m)  -  JJ  E(  xR*(m-l)P*(m)P(m)xR(m-l) 

+  x1  (m-l)P  (m)P(m)xI  (m-1) 

,-2B  * 

+  x  (m)x(m) }  . 


(14) 


So  far.  only  errors  originating  within  the  mth 
module  have  been  considered.  We  shall  now  formu¬ 
late  an  expression  for  the  total  average  MSE  at  the 
PF  FFT  output  due  to  all  modules.  It  may  be  seen 
from  (11)  and  (12)  that  the  error  vector  at  the 
output  of  the  mth  module  depends  on  Q(m)  and  G(m). 
However,  Q(m)  and  G(m)  are  uncorrelated  with  Q( i ) 
and  G(i)  for  t  i1  m  and  hence  6(m)  is  uncorrelated 
with  §(i).  The  total  average  MSE  at  the  PF  FFT 
output  is  therefore  the  sum  of  the  average  MSE’s 
due  to  each  module.  From  (7)  the  total  average 
2 

MSE  0  is  given  by 

d2  -  i  Z  E  Z  Z  E|e*s(m>|2 
1  ra  l  v  s 
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We  may  then  substitute  (14)  for  o  (m),  however  the 
first  two  terms  in  this  quantity  will  be  difficult 
to  compute.  As  an  alternative  we  shall  now  derive 

a  bound  on  d2(m)  which  Is  much  easier  to  compute 
2 

than  a  (m)  itsalf. 

The  first  term  In  (14)  may  be  bounded  as 
follows.  Letting  T  ^  P  (m)P(m)  gives 


xR  (m-l)P  (ra)P(m)xR(ra-l) 
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where  the  second  inequality  follows^  since  T  is  non¬ 
negative  definite,  and  the  ’ast  inequality  follows 
by  Cauchy -Schwartz.  The  second  term  in  (14)  can  be 
bounded  in  an  identical  fashion  giving 

J2(m)<  i  E(  S  |2  Z  Z  E|p  (m)|2 

N  k  k  l  n 

+  I  !  x£(m-l)|2  I  Z  E|  ^(m)!2 
k  In' 


*■  z~^2~  *  (»)*(»)}.  (16) 

Consider  Z  Z  E  Ip  (m)|2  in  (16).  It  is  ap- 
Jtn 

parent  from  (12)  that  the  matrix  P(m)  contain*  the 

elements  of  Q(m)  repeated  N/N  times,  each  time 

ra 

appropriately  superscripted  as  mentioned  before. 
All  other  entries  in  P(m)  are  zeroes.  Regardless 
of  the  superscripts  we  have  that 

Z  ~  E|p.(m)|2-S  £»  E|q  (m)|2  (17) 

2-1  n»l  m  l"l  j*l  J 

Using  f!6),  (17),  and  the  fact  that  E|q.  (m)|2  * 

,  .9  1J 

E  Wij('n)|  gives 

Ei-  r  =.l* n(“-l>l2 El«k£(»>>2 

m  n*l  k*l  i»l 
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Now,  if  x  (m)  is  the  nth  output  of  the  rath  module 
n 

it  follows  from  Parseval  chat 


N,  Z  (x  (m-D|2  . 


2  lx  (m)  I 
n-l  n 


By  induction  then 


,  N  ,  N....N  ,  N  N 

»  (■>  £  £  E  j  x  ] 2  [-1-  rf*  if  Elq  x0 

n*l  m  k*t  2*1  * 

M....N  ,-2B 

+  _i _ a  2 _ i 


where  x  *  x  (0)  Is  che  nth  input  to  the  PF  FFT. 
n  n 

Finally,  substituting  this  Into  (15)  gives  Che 
desired  bound 

,  N  ,  M  „  N  X  ,  ,-2B 

S'l  2  E'x  |2  [I((-f  ^  S^l,  (m)|  2>+^)1. 

n*l  m*l  N  k*L  X*l 
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